Fix manual labels for KMeans representative digits (was causing ~7% accuracy) semi-supervised learning example #202
While going through the semi-supervised learning example in the unsupervised learning chapter, I got a different accuracy when I manually labeled the representative digits based on what I actually saw in the images.
I first suspected a bug in my own code, but to verify I ran the original notebook as-is on Google Colab, and to my surprise the model's accuracy was only about 7%.
After digging into the code, I found the problem:
The hardcoded labels for the 50 representative digits (y_representative_digits) no longer match the clusters that KMeans currently produces. This is likely because changes in dataset ordering or in scikit-learn's KMeans behavior across versions (e.g. centroid initialization) shift which training image ends up closest to each centroid, so the old label list no longer lines up with the new representative images.
Because of this mismatch, the model was being trained on incorrect image–label pairs, which explains the terrible accuracy.
Fix:
Replaced the outdated y_representative_digits with correct labels, reassigned by manually inspecting the images closest to the current centroids.
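For reference, this is a minimal sketch of the pattern the notebook follows: recompute which training image sits closest to each centroid instead of trusting a stale hardcoded list. The variable names (`X_train`, `k`, `y_representative_digits`) follow the notebook's conventions, and the true labels of the representative images stand in here for the manual labeling step, so treat the setup details as assumptions:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Self-contained stand-in for the notebook's setup: digits dataset,
# first 1400 samples as the training set.
X_digits, y_digits = load_digits(return_X_y=True)
X_train, y_train = X_digits[:1400], y_digits[:1400]

# Cluster into k=50 clusters; fit_transform returns each sample's
# distance to every centroid.
k = 50
kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
X_digits_dist = kmeans.fit_transform(X_train)

# For each cluster, pick the training image closest to its centroid.
representative_digit_idx = np.argmin(X_digits_dist, axis=0)
X_representative_digits = X_train[representative_digit_idx]

# In the notebook these 50 images are labeled by hand after plotting
# them; the true labels stand in for that manual step here. This is
# the list that went stale in the original notebook.
y_representative_digits = y_train[representative_digit_idx]

# Training on just the 50 correctly labeled representatives should
# recover reasonable accuracy, unlike the mismatched hardcoded labels.
log_reg = LogisticRegression(max_iter=10_000)
log_reg.fit(X_representative_digits, y_representative_digits)
print(f"train accuracy: {log_reg.score(X_train, y_train):.2%}")
```

Recomputing the indices this way keeps the labels aligned with whatever clusters the installed scikit-learn version produces, which is why the hardcoded list broke in the first place.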
Note:
I also have an earlier PR open (#196). Kindly review that one as well.